Abstract: Overstaffed production in underground coal mining complicates daily management, and incomplete information about coal miners hinders rescuers during mine accidents. To address this safety and sustainability issue, this paper proposes a novel face recognition method based on an improved multiscale neural network. A new depthwise separable (DS)-inception block is designed, and a joint supervised loss function based on center loss theory is developed to construct a new multiscale model, so that miners can be recognized in the harsh underground environment during rescue. Experimental results show that the accuracy, recall and F1-score of the proposed method for miner face recognition in the underground mining environment are 97.26%, 94.17% and 95.42%, respectively. A transfer model with the joint supervised loss improves recognition accuracy by about 0.5% to 1.5%. In addition, in the dugout tunnel of a coal mine, the proposed face recognition method achieves an average recognition accuracy of 91.34% with a miss detection rate below 5%.
1. Introduction


With the rapid development of coal mine informatization [1, 2], deep learning recognition technology [3, 4] has drawn considerable attention in underground mining safety assessment. Compared with traditional miner management systems [5, 6], an underground coal mine face recognition system [7] can provide timely, comprehensive and reliable miner identification information to the mine's daily personnel management agencies, and can provide rescuers with the identity and regional location of trapped miners in the event of a mining accident. It therefore plays an important role in curbing overcrowded underground production, strengthening mine management [8] and supporting emergency rescue. However, underground coal mines not only have poor lighting, but the mining process also produces large amounts of dust, steam and coal ash, all of which make face recognition much more difficult. To this end, this paper presents an optimized face recognition algorithm to improve the accuracy of face recognition systems in underground coal mines.

The traditional approach to occluded face recognition is sparse representation. Sparse representation-based classification (SRC) [9] represents a high-dimensional image in a low-dimensional space and seeks to use the minimum number of training samples while achieving the minimum fitting error. He et al. [10] proposed a sparse representation algorithm based on the maximum entropy criterion, which can effectively cope with non-Gaussian errors and outliers. To encode more structural and discriminative information, Zheng et al. [11] integrated adaptive learning weights into the group sparse representation classifier (GSRC). Chen et al. [12] proposed a nuclear norm-based matrix regression (NMR) method to alleviate the influence of contiguous occlusion on face recognition. However, the ability of traditional methods to extract face features is limited: when face images are occluded, occlusion features are mixed with normal features, which greatly reduces recognition effectiveness. It is therefore urgent to improve the accuracy of face recognition algorithms.

With the development of deep learning [13, 14], deeper network models have greatly improved feature extraction capability. Wieczorek et al. [15] proposed a lightweight convolutional neural network-based face detection model for risk situations. Combining the two tasks of directly extracting age-invariant features and synthesizing faces, Zhao et al. [16] proposed a deep age-invariant model (AIM) for face recognition in the wild. Qiu et al. [17] proposed a face recognition method based on a single end-to-end deep neural network with strong anti-occlusion ability, which discovers corrupted features and cleans them. To further mitigate the discrepancy caused by resolution limitations, Gao et al. [18] proposed hierarchical deep CNN feature set-based representation learning for face recognition. However, current research on occluded face recognition mainly focuses on occlusion of specific regions [19-21], such as the eyes and mouth. Moreover, many studies have found that most face recognition models cannot achieve the best results on all face datasets.

In recent years, progress in deep learning has greatly improved the adaptability of face recognition models. To alleviate their poor generalization, transfer learning [22] has been applied to neural network training. Cai et al. [23] proposed a new generative adversarial network (OA-GAN) for natural face de-occlusion that does not require an occlusion mask. To mitigate the negative effects of masks on face recognition, Zhang et al. [24] proposed a new method to improve masked face recognition performance. Shukla et al. [25] applied transfer learning with MobileNetV2 to masked face identification and verification. To recognize face images taken in unconstrained environments, Tang et al. [26] proposed a face recognition algorithm based on depth map transfer learning. However, current public face datasets lack face samples from coal mine environments. It is therefore necessary to build a miner face dataset to fine-tune the transfer model.

In this paper, an improved face recognition method based on transfer learning is proposed to solve the problem of random coal dust occlusion in underground coal mines. First, a novel DS-inception block is designed to reduce the number of model parameters, and a multiscale neural network named DSR-inception is built on it. Second, a joint supervised loss function based on center loss and softmax loss is proposed for the face recognition classification task. Third, experimental results on the homemade miner face dataset show that the proposed network outperforms other classical face recognition models in accuracy, recall and F1-score, and that the transfer model with the joint supervised loss function achieves higher recognition accuracy. Finally, the validity of the proposed algorithm is verified by an industrial test in the dugout tunnel of a coal mine.

The remainder of this paper is organized as follows. Section 2 introduces the proposed algorithm and model architecture. Section 3 presents comparative experiments on model transfer strategies and verifies the effectiveness of the improved algorithm on the self-made miner face dataset. Section 4 builds a face recognition system and carries out an occluded face recognition experiment in an underground coal mine. Section 5 summarizes the conclusions and future work.
2. The proposed method 


2.1 Transfer learning 


Transfer learning applies a network trained on a related task to a new task, which addresses the insufficient generalization ability of traditional machine learning. In transfer learning, a domain $D$ can be expressed by Equation (1):

$$D = \{\mathcal{X}, P(X)\} \tag{1}$$

where $\mathcal{X}$ is the feature space, $X = \{x_1, x_2, \ldots, x_n\} \in \mathcal{X}$ is the set of sample data points, $x_i$ is a feature vector, and $P(X)$ is the marginal probability distribution.
A task $T$ can be expressed by Equation (2):

$$T = \{\mathcal{Y}, P(Y \mid X)\} \tag{2}$$

where $\mathcal{Y}$ is the label space and $P(Y \mid X)$ is the conditional probability distribution, i.e. the objective predictive function.
Based on these two equations, transfer learning can be defined as using the knowledge of a source task $T_S$ in a source domain $D_S$ to solve a learning task $T_T$ in a target domain $D_T$, thereby obtaining a better conditional probability distribution $P(Y_T \mid X_T)$ in the target domain.
2.2 Inception block and depthwise separable convolution


As shown in Appendix 1, the inception block [27] is the basic convolutional block of GoogLeNet [28]. It widens the network by splitting a single-size convolution kernel into kernels of different sizes, which enables the network to extract image features more fully while using fewer computational resources. The feature map $F'$ obtained after the inception block can be expressed by Equations (3) to (7):

$$F_1 = \mathrm{ReLU}\left(\mathrm{conv}(F, k_{1\times 1}) + b_1\right) \tag{3}$$

$$F_2 = \mathrm{ReLU}\left(\mathrm{conv}\left(\mathrm{ReLU}\left(\mathrm{conv}(F, k_{1\times 1}) + b\right), k_{3\times 3}\right) + b_2\right) \tag{4}$$

$$F_3 = \mathrm{ReLU}\left(\mathrm{conv}\left(\mathrm{ReLU}\left(\mathrm{conv}(F, k_{1\times 1}) + b\right), k_{5\times 5}\right) + b_3\right) \tag{5}$$

$$F_4 = \mathrm{ReLU}\left(\mathrm{conv}\left(\mathrm{MaxPool}(F, k_{3\times 3}), k_{1\times 1}\right) + b_4\right) \tag{6}$$

$$F' = \mathrm{Concat}(F_1, F_2, F_3, F_4) \tag{7}$$

where $F_1$, $F_2$, $F_3$ and $F_4$ are the feature maps produced by the four branches, $k_{i\times i}$ is a convolution kernel (or, for MaxPool, a pooling window) of size $i \times i$, and $b$ and $b_i$ are bias terms.


Depthwise separable convolution [29] combines two types of convolution kernels: a depthwise kernel and a pointwise kernel. A traditional convolution kernel fuses the channel and spatial information of the image in a single step, whereas depthwise separable convolution separates the channel information from the spatial information and processes them in turn before fusing them.

Depthwise separable convolution is therefore performed in two steps. In the depthwise step, each channel of the input is convolved separately with its own single-channel planar kernel, producing one planar feature map per channel. In the pointwise step, these planar feature maps are stacked, and a 1×1 pointwise kernel is applied across channels to combine them.
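The two steps just described can be illustrated with a minimal pure-Python sketch. It assumes stride 1 and no padding, stores tensors as nested lists indexed (channel, row, column), and uses cross-correlation as CNN libraries do; all function names and toy values are hypothetical. The final check confirms that a depthwise step followed by a pointwise step equals one conventional convolution whose kernel factorises as pointwise weight times depthwise kernel.

```python
# Minimal sketch of depthwise separable convolution (stride 1, no padding).
# "Convolution" here is cross-correlation, as in CNN frameworks.

def depthwise_conv(x, dw_kernels):
    """Step 1: convolve each input channel with its own k x k planar kernel.
    x: M channels, each an H x W nested list; dw_kernels: M kernels (k x k)."""
    out = []
    for channel, kernel in zip(x, dw_kernels):
        h, w, k = len(channel), len(channel[0]), len(kernel)
        out.append([[sum(kernel[i][j] * channel[r + i][c + j]
                         for i in range(k) for j in range(k))
                     for c in range(w - k + 1)]
                    for r in range(h - k + 1)])
    return out


def pointwise_conv(x, pw_weights):
    """Step 2: 1x1 convolution that mixes channels.
    pw_weights[n][m] weights input channel m for output channel n."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(weights[m] * x[m][r][c] for m in range(len(x)))
              for c in range(w)]
             for r in range(h)]
            for weights in pw_weights]


def depthwise_separable_conv(x, dw_kernels, pw_weights):
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


def standard_conv(x, kernels):
    """Conventional convolution, Equation (8): kernels[n][m] is the k x k
    kernel linking input channel m to output channel n."""
    h, w, k = len(x[0]), len(x[0][0]), len(kernels[0][0])
    return [[[sum(kernels[n][m][i][j] * x[m][r + i][c + j]
                  for m in range(len(x)) for i in range(k) for j in range(k))
              for c in range(w - k + 1)]
             for r in range(h - k + 1)]
            for n in range(len(kernels))]


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],    # input channel 0
     [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]    # input channel 1
dw = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]  # one 2x2 depthwise kernel per channel
pw = [[1, 2], [3, -1]]                     # 2 output channels from 2 inputs

print(depthwise_separable_conv(x, dw, pw))  # [[[10, 12], [16, 18]], [[16, 22], [34, 40]]]

# A conventional kernel factorised as pw[n][m] * dw[m][i][j] gives the same output.
factorised = [[[[pw[n][m] * dw[m][i][j] for j in range(2)] for i in range(2)]
               for m in range(2)] for n in range(2)]
assert standard_conv(x, factorised) == depthwise_separable_conv(x, dw, pw)
```

In TensorFlow, which the experiments use, the same operation is available as `tf.keras.layers.SeparableConv2D` (and the depthwise step alone as `tf.keras.layers.DepthwiseConv2D`); the sketch only makes the factorisation explicit.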
The output feature map of a conventional convolution, assuming stride one and padding, can be expressed by Equation (8):

$$G_{k,l,n} = \sum_{i,j,m} K_{i,j,m,n} \cdot F_{k+i-1,\,l+j-1,\,m} \tag{8}$$

where $G$ is the output feature map, $F$ is the input feature map and $K$ is the conventional convolution kernel. The computational cost of a conventional convolution can be expressed by Equation (9):

$$\mathrm{Cost} = D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F \tag{9}$$

where $D_K \times D_K$ is the spatial dimension of the kernel, $M$ is the number of input channels, $N$ is the number of output channels, and $D_F \times D_F$ is the size of the feature map.
The output feature map of the depthwise step of a depthwise separable convolution with the same parameters can be expressed by Equation (10):

$$\hat{G}_{k,l,m} = \sum_{i,j} \hat{K}_{i,j,m} \cdot F_{k+i-1,\,l+j-1,\,m} \tag{10}$$

where $\hat{G}$ is the output feature map, $F$ is the input feature map, and $\hat{K}$ is the depthwise convolution kernel, whose $m$-th filter is applied to the $m$-th input channel. The cost of a depthwise separable convolution can be expressed by Equation (11):

$$\mathrm{Cost}_{DS} = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F \tag{11}$$

which is the sum of the costs of the depthwise convolution and the 1×1 pointwise convolution. Replacing traditional convolution with depthwise separable convolution therefore reduces the size and computational cost of the network model by the ratio in Equation (12):

$$\frac{\mathrm{Cost}_{DS}}{\mathrm{Cost}} = \frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2} \tag{12}$$
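Equations (9) to (12) can be checked numerically. The sketch below evaluates both cost formulas for one hypothetical layer (3×3 kernels, 64 input and 128 output channels, a 56×56 feature map; these sizes are illustrative and not taken from the paper) and confirms with exact fractions that the ratio equals $1/N + 1/D_K^2$. The function names are hypothetical.

```python
from fractions import Fraction


def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a conventional convolution, Equation (9)."""
    return dk * dk * m * n * df * df


def separable_conv_cost(dk, m, n, df):
    """Depthwise part plus 1x1 pointwise part, Equation (11)."""
    return dk * dk * m * df * df + m * n * df * df


# Hypothetical layer: 3x3 kernels, 64 -> 128 channels, 56x56 feature map.
dk, m, n, df = 3, 64, 128, 56
ratio = Fraction(separable_conv_cost(dk, m, n, df),
                 standard_conv_cost(dk, m, n, df))
assert ratio == Fraction(1, n) + Fraction(1, dk * dk)  # Equation (12)
print(ratio)  # 137/1152, i.e. roughly 8.4 times fewer multiply-adds
```

For 3×3 kernels the $1/D_K^2$ term dominates once $N$ is large, so the saving approaches a factor of about 9.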


2.3 Improved DS-Inception block 


In this paper, depthwise separable convolution is fused into the inception block by replacing traditional convolution kernels with depthwise separable convolution kernels. The improved block is called DS-Inception, and its structure is shown in Appendix 2. The feature map $F'$ obtained after the DS-Inception block can be expressed by Equations (13) to (18):

$$F_1 = \mathrm{ReLU}\left(\mathrm{conv}(F, k_{1\times 1}) + b_1\right) \tag{13}$$

$$F_2 = \mathrm{ReLU}\left(\mathrm{DW}\left(\mathrm{ReLU}\left(\mathrm{conv}(F, k_{1\times 1}) + b_2\right), k_{3\times 3}\right)\right) \tag{14}$$

$$F_3 = \mathrm{ReLU}\left(\mathrm{DW}\left(\mathrm{ReLU}\left(\mathrm{conv}(F, k_{1\times 1}) + b_3\right), k_{5\times 5}\right)\right) \tag{15}$$

$$F_4 = \mathrm{ReLU}\left(\mathrm{conv}\left(\mathrm{MaxPool}(F, k_{3\times 3}), k_{1\times 1}\right) + b_4\right) \tag{16}$$

$$F_5 = \mathrm{ReLU}\left(\mathrm{conv}(F, k_{1\times 1}) + b_5\right) \tag{17}$$

$$F' = \mathrm{Add}\left(\mathrm{Concat}(F_1, F_2, F_3, F_4), F_5\right) \tag{18}$$

where $F_1$, $F_2$, $F_3$, $F_4$ and $F_5$ are the feature maps produced by the five branches, $\mathrm{DW}(\cdot, k_{i\times i})$ denotes depthwise convolution with an $i \times i$ kernel, $k_{i\times i}$ is a kernel of size $i \times i$, and $b_i$ is the bias.


Depthwise separable convolution replaces the large convolution kernels in the inception block, and a residual structure is introduced into the block. Many studies have shown that the residual structure [30] can effectively suppress gradient vanishing (dispersion) and accelerate model convergence. The improved block has far fewer parameters than the original one. If each branch convolution has $K$ channels and the input layer has $N$ channels, the original block has $4NK + 34K^2$ parameters and the improved block has $4NK + 34K + 2K^2$, that is, $32K^2 - 34K$ fewer than the original block.
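As a worked example of the parameter counts above, the sketch below evaluates both formulas for a first block with an RGB input ($N = 3$) and 16 kernels per branch ($K = 16$), the setting described in Section 2.4. Biases are ignored, matching the formulas. The original-block count is derived branch by branch; the $2K^2$ term of the improved block is taken from the text as stated rather than derived, and the function names are hypothetical.

```python
def inception_params(n, k):
    """Weights of the original inception block (biases ignored).
    Four 1x1 convolutions map N -> K channels; the 3x3 and 5x5
    convolutions map K -> K channels."""
    return 4 * n * k + (3 * 3 + 5 * 5) * k * k  # 4NK + 34K^2


def ds_inception_params(n, k):
    """Count stated in the text for DS-Inception: 4NK + 34K + 2K^2.
    34K = one 3x3 plus one 5x5 depthwise filter per channel; the 2K^2
    term is copied from the text, not derived here."""
    return 4 * n * k + (3 * 3 + 5 * 5) * k + 2 * k * k


# Illustrative first block: RGB input (N = 3), K = 16 kernels per branch.
n, k = 3, 16
saved = inception_params(n, k) - ds_inception_params(n, k)
print(inception_params(n, k), ds_inception_params(n, k), saved)  # 8896 1248 7648
assert saved == 32 * k * k - 34 * k
```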


2.4 Improved multiscale neural network 


Following the VGG-16 architecture, this paper designs a multiscale convolutional neural network based on the DS-Inception block, called the DSR-inception network (see Figure 1). The network mainly consists of pooling modules and convolution modules, where each convolution module comprises a DS-inception block, a ReLU activation function and batch normalization. The five traditional convolutional modules of VGG-16 are replaced with the proposed convolution modules, which significantly reduces the number of model parameters. The ReLU activation function is applied to neurons in all convolutional and fully connected layers, whereas the sigmoid activation function is applied to neurons in the last layer to output the classification results. A 2×2 max-pooling layer with a stride of 2 follows each convolution module to aggregate the transmitted information. The number of filters increases with network depth, allowing the network to learn more detailed information from the input image.
Figure 1. Framework of DSR-Inception network.

The processing flow of the designed network model is illustrated in Figure 2. First, a face image of size 224×224×3 is fed into the network through two branches. One branch passes through the DS-inception block, each branch of which uses 16 convolution kernels; this block outputs a feature map of size 224×224×64. The other branch adjusts the number of channels through the residual connection. The feature maps of the two branches are summed element-wise, and the result passes through a ReLU activation function, batch normalization and a pooling layer. After this operation is repeated five times, the face features in the image are fully extracted and a feature map of size 7×7×1024 is obtained. Finally, the feature map is fed into a global average pooling layer and a dropout layer, and the recognition result is output by the sigmoid classifier.
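The shape bookkeeping of this flow can be traced with a short sketch. It assumes that the channel count doubles from one stage to the next (64, 128, 256, 512, 1024), which agrees with the 224×224×64 output of the first block and the final 7×7×1024 map stated above but is not spelled out in the text; `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(size=224, in_channels=3, kernels_per_branch=16, stages=5):
    """Feature-map shapes (H, W, C) through the DSR-inception stages.
    Assumptions: the 4 concatenated branches give 4 * kernels_per_branch
    channels in stage 1, the channel count doubles at each later stage,
    and each stage ends with a 2x2, stride-2 max-pool that halves H and W."""
    shapes = [(size, size, in_channels)]
    channels = 4 * kernels_per_branch
    for _ in range(stages):
        shapes.append((size, size, channels))  # DS-inception + residual add
        size //= 2                             # 2x2 max-pool, stride 2
        shapes.append((size, size, channels))
        channels *= 2
    return shapes


for shape in trace_shapes():
    print(shape)
# The last shape printed is (7, 7, 1024), the input to global average pooling.
```

Five 2×2, stride-2 poolings reduce 224 by a factor of $2^5 = 32$, which is why the final spatial size is 7.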
Figure 2. Processing flow of DSR-Inception network.


The designed multiscale neural network contains convolution kernels of different sizes to extract features at different scales, which addresses the low accuracy of facial features extracted by a single-scale network. Moreover, the residual structure is added to accelerate the convergence of network training and to help prevent overfitting.


2.5 Joint supervised loss function 


Optimization in a classification problem is essentially the minimization of an objective function. The softmax loss function is commonly used for multi-class image classification and is defined in Equation (19):

$$L_S = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}} \tag{19}$$

The softmax loss is differentiable and easy to optimize, but the features it learns lack discriminative power: in face recognition, intra-class variation can remain large, so images of the same person may end up far apart in feature space.
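Equation (19) can be computed directly from classifier logits. The sketch below is a minimal pure-Python version; `logits[i][j]` stands for $W_j^T x_i + b_j$ (an assumed input layout), and the maximum logit is subtracted before exponentiation, a standard trick that leaves the loss unchanged but avoids overflow.

```python
import math


def softmax_loss(logits, labels):
    """Summed softmax loss of Equation (19).
    logits[i][j] stands for W_j^T x_i + b_j; labels[i] is y_i."""
    total = 0.0
    for z, y in zip(logits, labels):
        z_max = max(z)  # shift by the largest logit so exp() cannot overflow
        log_norm = z_max + math.log(sum(math.exp(v - z_max) for v in z))
        total += log_norm - z[y]  # -log of the softmax probability of class y
    return total


print(softmax_loss([[2.0, 0.5, -1.0]], [0]))  # confident and correct: small loss
print(softmax_loss([[2.0, 0.5, -1.0]], [2]))  # confident and wrong: large loss
```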
The center loss function [31] works like a clustering term that pulls the features of each class toward a class center, which is equivalent to imposing a strong constraint on each class. It is defined in Equation (20):

$$L_C = \frac{1}{2}\sum_{i=1}^{m} \left\lVert x_i - c_{y_i} \right\rVert_2^2 \tag{20}$$

The joint supervised loss function consists of the softmax loss and the center loss. It optimizes classification accuracy and feature-center clustering simultaneously, and its schematic is shown in Appendix 3. The softmax loss maximizes face classification accuracy by minimizing the difference between the model's output probability distribution and the true label. The center loss pulls feature vectors of the same category toward their class center, so that, combined with the softmax term, features of different identities become easier to separate and identity information is better discriminated. Furthermore, the center loss uses a weight parameter $\alpha$ that regulates the update speed of the class centers. Equation (21) shows the joint supervised loss function.


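$$L = L_S + \lambda L_C = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}} + \frac{\lambda}{2}\sum_{i=1}^{m} \left\lVert x_i - c_{y_i} \right\rVert_2^2 \tag{21}$$

where $m$ and $n$ are the numbers of samples and categories, $x_i$ is the feature of the $i$-th image, $y_i$ is its category label, $W_j$ is the weight of the fully connected layer for category $j$, $b_j$ is the bias, $c_{y_i}$ is the center of class $y_i$, and $\lambda$ is the balancing coefficient with $\lambda \in (0, 0.1)$; $\lambda = 0.01$ in this paper.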
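The center loss of Equation (20), the joint loss of Equation (21) and a center update controlled by $\alpha$ can be sketched as follows. The update rule is the mini-batch rule of the original center-loss formulation (Wen et al., 2016), assumed here to match [31]; $\alpha = 0.5$ is a placeholder because the text does not report its value, while $\lambda = 0.01$ follows the text. The softmax term is passed in as a number to keep the example short, and all names are hypothetical.

```python
def center_loss(features, labels, centers):
    """Equation (20): half the summed squared distance to each class center."""
    return 0.5 * sum(sum((a - c) ** 2 for a, c in zip(x, centers[y]))
                     for x, y in zip(features, labels))


def joint_loss(softmax_term, features, labels, centers, lam=0.01):
    """Equation (21): L = L_S + lambda * L_C, with lambda = 0.01 as in the text.
    softmax_term is the value of Equation (19), computed elsewhere."""
    return softmax_term + lam * center_loss(features, labels, centers)


def update_centers(features, labels, centers, alpha=0.5):
    """Mini-batch center update (assumed, after Wen et al.):
    delta_j = sum_{i: y_i = j} (c_j - x_i) / (1 + n_j);  c_j <- c_j - alpha * delta_j.
    alpha = 0.5 is a placeholder value."""
    new_centers = []
    for j, c in enumerate(centers):
        members = [x for x, y in zip(features, labels) if y == j]
        delta = [sum(c[d] - x[d] for x in members) / (1 + len(members))
                 for d in range(len(c))]
        new_centers.append([cd - alpha * dd for cd, dd in zip(c, delta)])
    return new_centers


features = [[2.0, 0.0], [0.0, 2.0]]  # two samples of class 0
labels = [0, 0]
centers = [[0.0, 0.0], [5.0, 5.0]]   # class 1 has no samples in this batch
print(joint_loss(1.2, features, labels, centers))  # 1.2 + 0.01 * 4.0
print(update_centers(features, labels, centers))   # class 0 moves toward its samples
```

The $1 + n_j$ denominator keeps classes that are absent from a mini-batch (class 1 above) from moving at all.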


3. Experimental results and analysis 
3.1 Experimental platform 


The model is trained in a GPU environment, and the environment configuration is shown in Table 1.


Table 1. Training environment configuration.

Name                      Parameter
CPU                       Intel Core i9-10980XE
Hard disk                 2 TB
GPU                       NVIDIA RTX A4000
Memory                    16 GB
Deep learning framework   TensorFlow 2.6.0
Operating system          Windows 10
Programming language      Python 3.7
CUDA                      10.0


3.2 Dataset 


The pre-training dataset selected in this paper is the CelebA face dataset, in which faces are annotated with attributes for classification and each face image is labeled with a face bounding box and landmark points. Appendix 4 shows a portion of the face images in the dataset.

The homemade miner face dataset is named the MF dataset. It contains 40 people with 21 images each: 7 images each without coal ash occlusion, with light coal ash occlusion and with heavy coal ash occlusion, for a total of 840 images. Table 2 shows example face data for one person.


Table 2. Sample face data from the same person.

[Image grid not reproduced. Rows: without obscuration, light obscuration, heavy obscuration. Columns (pose): front 0°, right 15°, right 45°, left 15°, left 45°, up 15°, down 15°.]


3.3 Contrast experiment

In this paper, stochastic gradient descent (SGD) is chosen as the training optimizer, and the remaining hyperparameters, namely batch size, initial learning rate $\alpha_0$, natural decay index $p$ and number of epochs, are set to 64, 0.01, 0.05 and 150, respectively. InceptionV1, VGG-16 and ResNet-18 are selected as comparison models and trained on the CelebA face dataset. The accuracy and loss curves during training are shown in Appendix 5; they indicate that the proposed DSR-inception network converges faster and reaches higher accuracy than the other network models.
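The training schedule can be sketched as follows, under the assumption that the natural decay index $p$ refers to natural exponential decay applied once per epoch, $\alpha_t = \alpha_0 e^{-p t}$; the text does not give the exact formula, and the helper names are hypothetical. The SGD step is the plain update without momentum.

```python
import math


def natural_exp_decay(epoch, lr0=0.01, p=0.05):
    """Assumed schedule: lr = lr0 * exp(-p * epoch), with lr0 = 0.01, p = 0.05."""
    return lr0 * math.exp(-p * epoch)


def sgd_step(weights, grads, lr):
    """Plain SGD update: w <- w - lr * grad (no momentum)."""
    return [w - lr * g for w, g in zip(weights, grads)]


# Learning rate at a few of the 150 epochs (indices 0 to 149).
for epoch in (0, 50, 100, 149):
    print(epoch, natural_exp_decay(epoch))

print(sgd_step([1.0, -2.0], [0.5, 0.5], natural_exp_decay(0)))
```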
Table 3. Design of different transfer learning schemes.

Number  Transfer strategy
a       Without retraining
b       Freeze the weights of Conv1 and retrain the rest
c       Freeze the weights of Conv1~2 and retrain the rest
d       Freeze the weights of Conv1~3 and retrain the rest
e       Freeze the weights of Conv1~4 and retrain the rest
f       Freeze the weights of all convolution layers

Train samples: 14,000; test samples: 4,000 (common to schemes a to f).


The transfer strategies are listed in Table 3, and the comparison results are shown in Appendix 6. Transfer learning works best when the weights of modules Conv1~3 are frozen and the rest are retrained.
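The schemes in Table 3 differ only in how many leading convolution modules keep their pre-trained weights. The sketch below encodes that as a mapping from scheme to per-module trainable flags; the module names and the separate "Head" entry (global average pooling, dropout and classifier) are hypothetical labels. In Keras, the framework used here, freezing corresponds to setting a layer's `trainable` attribute to `False` before compiling.

```python
# Hypothetical module names: five DS-inception convolution modules plus the
# classification head (global average pooling, dropout, classifier).
MODULES = ["Conv1", "Conv2", "Conv3", "Conv4", "Conv5", "Head"]

# Number of leading convolution modules frozen in each Table 3 scheme;
# None marks scheme a (pre-trained model used as-is, nothing retrained).
FROZEN_PREFIX = {"a": None, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5}


def trainable_flags(scheme):
    """Return {module: trainable?} for a transfer scheme from Table 3."""
    frozen = FROZEN_PREFIX[scheme]
    if frozen is None:
        return {name: False for name in MODULES}
    return {name: index >= frozen for index, name in enumerate(MODULES)}


# Scheme d, the best-performing one: Conv1~3 frozen, Conv4, Conv5 and head retrained.
print(trainable_flags("d"))
# {'Conv1': False, 'Conv2': False, 'Conv3': False, 'Conv4': True, 'Conv5': True, 'Head': True}
```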

The CelebA face dataset is used for model pre-training, with 14,000 images for training and 4,000 for testing. The experimental results are shown in Table 4: the proposed model achieves higher precision and recall than the other network models. Moreover, the proposed model size is only 46.56 M, and its average test time per face image is 258 ms.


Table 4. Experimental comparison of different networks.

Model        Precision  Recall   F1-score  Memory (M)  Test time (ms/pic)
InceptionV1  95.34%     90.86%   93.05%    189         547
VGG-16       91.53%     88.39%   89.93%    526         625
ResNet-18    94.82%     89.73%   92.21%    246         443
Proposed     97.26%     94.17%   95.42%    46.56       258
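The F1-score in Table 4 is the harmonic mean of precision and recall, $F_1 = 2PR/(P+R)$. Recomputing it for the three baseline rows reproduces the reported values to within 0.01 percentage points; `f1_score` is a hypothetical helper name.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# Baseline rows of Table 4: (precision %, recall %, reported F1 %).
rows = {
    "InceptionV1": (95.34, 90.86, 93.05),
    "VGG-16": (91.53, 88.39, 89.93),
    "ResNet-18": (94.82, 89.73, 92.21),
}
for model, (p, r, reported) in rows.items():
    print(f"{model}: computed {f1_score(p, r):.2f}, reported {reported:.2f}")
```

The harmonic mean is dominated by the smaller of the two values, which is why every model's F1-score sits closer to its recall than to its precision.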


The MF dataset is used for model fine-tuning, with the improved loss function introduced as the experimental variable for comparison. The experimental results are shown in Appendix 7. Averaged over 20 groups of experiments, the F1-score of the model with the joint supervised loss function is about 0.5% to 1.5% higher than without it, which verifies the effectiveness of the joint supervised loss function.

This paper also compares the proposed model with several recent mainstream methods on the CelebA face dataset; the results are shown in Table 5.

Table 5. Comparison between the proposed model and state-of-the-art methods.

Model           Accuracy (%)
SNNBER [32]     93.54
OPFaceNet [33]  96.83
IFDM [34]       95.64
LSCSR [35]      96.50
MER [36]        96.78
MMPCANet [37]   97.18
Proposed        97.26


As can be seen from Table 5, the proposed method outperforms most existing methods. Its accuracy is higher than that of every other method listed, indicating that fusing multi-scale facial features can effectively improve the recognition accuracy of occluded faces.


4. Application of the proposed approach in an underground coal mine


To further verify the feasibility of the proposed face recognition system in underground coal mines, the system is set up in a tunneling (dugout) tunnel. A KBA18W mining monitoring camera is used to capture images, and the signal is transmitted through an underground wireless router. An LED light source provides auxiliary lighting, and the industrial test site is set up as shown in Figure 3(a).
(a) Industrial test scene
(b) Running interface of face recognition system
Figure 3. Field testing.


Before the industrial test, the face data of six staff members are registered in the system. The system display interface is shown in Figure 3(b).

A 60-hour working period is selected to verify the feasibility of the system, and the recognition results are counted to evaluate its recognition rate. The results are shown in Table 6.
Table 6. Industrial test results.

                 XW-01   XW-02   XW-03   XW-04   XW-05   XW-06   Average
Number of faces  296     320     453     416     380     374     -
Precision        89.73%  90.94%  91.34%  92.86%  91.04%  92.16%  91.34%
Loss             6.42%   4.84%   3.81%   3.57%   3.92%   3.12%   4.28%
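The Average column of Table 6 can be reproduced from the per-person values. The sketch below shows that it matches an unweighted mean over the six registered people (91.345% precision, reported as 91.34%, and 4.28% loss), and also computes a mean weighted by the number of faces for comparison, a figure the text does not report.

```python
from statistics import mean

# Per-person results from Table 6 (XW-01 ... XW-06), in percent.
faces = [296, 320, 453, 416, 380, 374]
precision = [89.73, 90.94, 91.34, 92.86, 91.04, 92.16]
loss = [6.42, 4.84, 3.81, 3.57, 3.92, 3.12]

print(f"unweighted mean precision: {mean(precision):.3f}")  # 91.345
print(f"unweighted mean loss: {mean(loss):.2f}")            # 4.28

# Weighting each person by the number of recognized faces (not used in the text).
weighted = sum(f * p for f, p in zip(faces, precision)) / sum(faces)
print(f"face-count-weighted precision: {weighted:.2f}")
```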


The average recognition accuracy of the proposed face recognition system in the dugout tunnel of the coal mine reaches 91.34%, and the average miss detection rate is 4.28%, below 5%. The industrial test results show that the system meets the design requirements.
5. Conclusions and future work 


In this paper, an improved face recognition method based on transfer learning is proposed for underground coal mines to solve the problem of random coal dust occlusion. A novel DS-inception block is designed to reduce the number of model parameters, and a joint supervised loss function based on center loss and softmax loss is proposed for the face recognition classification task. Compared with classical models such as InceptionV1, VGG-16 and ResNet-18, the proposed multiscale neural network achieves an accuracy, recall and F1-score of 97.26%, 94.17% and 95.42%, respectively. To better adapt to face recognition in coal mines, a miner face dataset is built to fine-tune the transfer model, and the transfer model with the joint supervised loss improves face recognition accuracy by about 0.5% to 1.5%. In addition, the average recognition accuracy of the designed face recognition system in the dugout tunnel of a coal mine reaches 91.34%, and the miss detection rate is below 5%.

This paper verifies the effectiveness of the proposed face recognition method in underground coal mines. However, this work does not yet consider face recognition under large pose variations, and further research on face recognition under different poses is needed in the future.